Abstract


In 1978, Merkle and Hellman introduced a knapsack-based public-key
cryptosystem, which received widespread attention. The two major open
problems concerning this cryptosystem are:

(i)  Security:  How  difficult  are  the  Merkle-Hellman  knapsacks? 

(ii)  Efficiency:  Can  the  huge  key  size  be  reduced? 

In  this  paper  we  analyze  the  cryptographic  security  of  knapsack  problems 
with  small  keys,  develop  a  new  (non-enumerative)  type  of  algorithm  for 
solving  them,  and  use  the  algorithm  to  show  that  under  certain  assumptions 
it  is  as  difficult  to  find  the  hidden  trapdoors  in  Merkle-Hellman  knapsacks 
as  it  is  to  solve  general  knapsack  problems. 


1. Motivation


To introduce our notation, we briefly describe the Merkle-Hellman
cryptosystem (more details can be found in Merkle and Hellman [1978]).

The published key is a list of n generators $a_i$, each one of which is a
random-looking q-bit number (the recommended parameters are $n \ge 100$,
$q \ge 200$). To encrypt an n-bit message $X = x_1 \ldots x_n$ the sender uses
the receiver's key to compute the cyphertext

$$b = \sum_{i=1}^{n} x_i a_i ,$$

and transmits it over the insecure communication channel. To decrypt this
cyphertext, the receiver uses a secret structure (trapdoor) embedded in the
generators in order to solve this knapsack problem by a shortcut polynomial
method.
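The encryption step can be sketched in a few lines of Python. This is only an illustration of the sum $b = \sum x_i a_i$; the function name and the three-generator toy key are assumptions made for the example and are far smaller than the recommended parameters.

```python
def knapsack_encrypt(bits, generators):
    """Return the cyphertext b = sum(x_i * a_i) for a 0-1 message."""
    if len(bits) != len(generators):
        raise ValueError("message and key must have the same length")
    if any(x not in (0, 1) for x in bits):
        raise ValueError("message bits must be 0 or 1")
    return sum(x * a for x, a in zip(bits, generators))


# Hypothetical toy key; a real key would have n >= 100 generators of q >= 200 bits.
toy_key = [19, 31, 46]
cyphertext = knapsack_encrypt([1, 1, 0], toy_key)  # 19 + 31
```

Decryption is deliberately not shown: it depends on the secret trapdoor, which is not part of the published key.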

An eavesdropper, who knows b and the $a_i$'s but not the secret trapdoor,
is forced to use some general purpose knapsack solving algorithm, and
even the best such algorithm (Schroeppel and Shamir [1979]) is currently
too slow for problems of this size.

The  main  practical  drawback  of  the  Merkle-Hellman  scheme  is  its 
huge  key  size  (tens  of  thousands  of  bits,  compared  with  hundreds  of  bits 
in  the  Rivest-Shamir-Adleman  [1978]  scheme  and  tens  of  bits  in  the  DES 
[1976]  scheme).  The  public  key  directory  of  large  communication  networks 
(telephone  users,  banks  or  military  installations)  can  be  extremely  long, 
and  the  many  minutes  required  to  exchange  such  keys  over  slow  telephone 
lines  can  severely  restrict  the  usefulness  of  this  public  key  cryptosystem. 

To reduce the size of the key in a knapsack based cryptosystem, we
can shorten the generators or decrease their number. The first approach
is impossible, since:

(i) When $q < n$, the decryption function becomes ambiguous since there
cannot be enough distinct sums to encode all the $2^n$ possible
messages.

(ii) When $q \approx n$, the encryption function is almost a permutation, and
knapsacks with this property seem to be cryptographically
insecure (see Shamir [1979]).

(iii)  When  q  is  sufficiently  small,  the  cryptanalyst  can  prepare  a 
complete  cleartext-cyphertext  table  by  preprocessing  the 
published  key. 

The second approach (which is mentioned in Merkle and Hellman's
original paper) is possible, provided we use multi-bit substrings of the
message  as  coefficients.  All  the  knapsack  solving  algorithms  developed 
to  date  are  based  on  the  enumeration  of  potential  solutions,  and  thus 
their  complexity  does  not  change  when  we  replace  an  equation  with  one 
hundred  0-1  coefficients  by  an  equation  with  four  25-bit  coefficients 
(which  are  the  four  quarters  of  the  100-bit  message).  The  key  size,  on 
the  other  hand,  is  reduced  by  a  factor  of  25,  which  makes  this  approach 
extremely  attractive  from  the  cryptographic  point  of  view. 
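As a rough illustration of the second approach, the sketch below splits a message into multi-bit coefficients and encrypts it with a short list of generators. The function names and the tiny generator values are hypothetical; only the idea of using substrings of the message as coefficients comes from the text.

```python
def split_message(message, chunk_bits, n_chunks):
    """Split an integer message into n_chunks coefficients of chunk_bits bits,
    most significant chunk first."""
    if message < 0 or message >= 1 << (chunk_bits * n_chunks):
        raise ValueError("message does not fit into the given chunks")
    mask = (1 << chunk_bits) - 1
    return [(message >> (chunk_bits * k)) & mask
            for k in reversed(range(n_chunks))]


def compact_encrypt(message, generators, chunk_bits):
    """Encrypt with one multi-bit coefficient per generator."""
    coeffs = split_message(message, chunk_bits, len(generators))
    return sum(x * a for x, a in zip(coeffs, generators))


def key_size_bits(n_generators, generator_bits):
    """Total size of the published key, ignoring the bound on the coefficients."""
    return n_generators * generator_bits
```

With generators of equal length, replacing one hundred generators by four shrinks the key by the factor of 25 mentioned above: `key_size_bits(100, 200) // key_size_bits(4, 200)` evaluates to 25.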

In  this  paper,  we  investigate  the  complexity  of  compact  knapsack 
problems  with  a  small  number  of  generators  and  multi-bit  coefficients. 


In particular, we develop a new kind of knapsack solving algorithm which
is not based on the enumeration of potential solutions, and use it to
show  that  compact  knapsacks  are  considerably  less  secure  than  their  0-1 
counterparts. 

2.  Preliminaries 

Definition: The set of n-generator knapsack problems is the set of
equations of the form

$$\sum_{i=1}^{n} x_i a_i = b$$

in which the generators $a_i$ and the target value b are given natural
numbers, and the coefficients $x_i$ (which must be integral and non-negative)
are the unknowns. The set of compact knapsack problems is the union of
these sets for all n.

Remarks: (i) There is a trivial upper bound of $\lfloor b/a_i \rfloor$ on the value
of each $x_i$, and thus the set of compact knapsack problems is in NP. An
easy reduction from set covering shows that it is NP-complete.

(ii) In cryptographic applications, it is necessary to publish a limit
$\ell$ as part of the encryption key, and to encrypt only messages in which
$0 \le x_i < \ell$ (without such a bound, the decryption process cannot be
unambiguous). This upper bound is assumed to be known to the
cryptanalyst, and can reduce the size of his search space from
$\lfloor b/a_1 \rfloor \cdot \lfloor b/a_2 \rfloor \cdots \lfloor b/a_n \rfloor$ to $\ell^n$.

Theorem 1: The sets of 1-, 2- and 3-generator knapsack problems are
polynomially solvable.

Proof: (1) The 1-generator knapsack problem $x_1 a_1 = b$ is solvable iff
$a_1$ divides b.

(2) Let $g = \gcd(a_1, a_2)$. The equation

$$x_1 a_1 + x_2 a_2 = b$$

has integral solutions only if g divides b, in which case its most general
integral solution is

$$x_1 = c_1 (b/g) + t (a_2/g)$$
$$x_2 = c_2 (b/g) - t (a_1/g)$$

where t is an arbitrary integer and $c_1$, $c_2$ are the coefficients
derived by Euclid's algorithm from the equation

$$c_1 (a_1/g) + c_2 (a_2/g) = 1 .$$

The two inequalities $x_1 \ge 0$, $x_2 \ge 0$ define two rays of t values, and the
2-generator knapsack problem is solvable iff the intersection of the rays
contains an integral point.

(3)  This  is  a  recent  result  whose  proof  is  beyond  the  scope  of  this  paper. 
The  interested  reader  is  referred  to  Kannan  and  Shamir  [1980].  Q.E.D. 
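Part (2) of the proof translates directly into code. The following is a minimal sketch, assuming positive generators; the function names are illustrative, and the particular solution is scaled by b/gcd so that it also covers generators that are not relatively prime.

```python
def extended_gcd(a, b):
    """Return (g, c1, c2) with c1*a + c2*b == g == gcd(a, b)."""
    old_r, r = a, b
    old_c1, c1 = 1, 0
    old_c2, c2 = 0, 1
    while r:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_c1, c1 = c1, old_c1 - q * c1
        old_c2, c2 = c2, old_c2 - q * c2
    return old_r, old_c1, old_c2


def solve_two_generators(a1, a2, b):
    """Return non-negative integers (x1, x2) with x1*a1 + x2*a2 == b, or None."""
    g, c1, c2 = extended_gcd(a1, a2)
    if b % g:
        return None                      # no integral solution at all
    p, q = a2 // g, a1 // g
    x1_0, x2_0 = c1 * (b // g), c2 * (b // g)
    # General solution: x1 = x1_0 + t*p, x2 = x2_0 - t*q.
    t_min = -(x1_0 // p)                 # smallest t with x1 >= 0 (first ray)
    t_max = x2_0 // q                    # largest t with x2 >= 0 (second ray)
    if t_min > t_max:
        return None                      # the two rays do not intersect
    return x1_0 + t_min * p, x2_0 - t_min * q
```

For example, `solve_two_generators(19, 31, 50)` returns a pair whose weighted sum is 50, while `solve_two_generators(5, 7, 3)` returns `None` because every integral solution has a negative coefficient.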

The complexity of n-generator knapsack problems for any fixed $n \ge 4$
is still open: to the best of my knowledge, no such set was ever shown
to be either NP-complete or polynomially solvable. The best published
algorithm for them takes $O(\sqrt{p})$ time both in the worst case and in the
average case measures, where p is the number of points in the search
space.

3.  The  New  Approach 

Definition: Given a compact knapsack problem K with a bound $\ell$ on the
values of the coefficients, max(K) is defined as the largest target value
which can be represented by the generators, i.e.,

$$\max(K) = \sum_{i=1}^{n} (\ell - 1) a_i .$$

Definition: Two compact knapsack problems

$$K: \quad \sum_{i=1}^{n} x_i a_i = b, \qquad 0 \le x_i < \ell$$

$$K': \quad \sum_{i=1}^{n} x_i a'_i = b', \qquad 0 \le x_i < \ell$$

are similar if there are two relatively prime numbers w (the multiplier)
and m (the modulus) such that $m > \max(K)$, $m > \max(K')$, $b' \equiv wb \pmod{m}$ and
for all i, $a'_i \equiv w a_i \pmod{m}$.

Lemma 2: Similarity is a reflexive and symmetric relation, and it is
transitive whenever all the moduli used are the same.

Proof:  Immediate  from  the  fact  that  the  multipliers  which  are  relatively 
prime  to  m  form  a  multiplicative  group.  Q.E.D. 

Example: The three compact knapsack problems

$$K_1: \quad x_1 \cdot 19 + x_2 \cdot 31 + x_3 \cdot 46 = 50, \qquad 0 \le x_i < 2$$

$$K_2: \quad x_1 \cdot 32 + x_2 \cdot 15 + x_3 \cdot 19 = 47, \qquad 0 \le x_i < 2$$

$$K_3: \quad x_1 \cdot 21 + x_2 \cdot 13 + x_3 \cdot 3 = 34, \qquad 0 \le x_i < 2$$

are similar, since $K_2$ is obtained from $K_1$ by multiplying its generators
and target value by 7 (mod 101), $K_3$ is obtained from $K_1$ by multiplying
its generators and target value by 33 (mod 101), and

$$101 = m > \max(K_1) = 19 + 31 + 46 = 96$$

$$101 = m > \max(K_2) = 32 + 15 + 19 = 66$$

$$101 = m > \max(K_3) = 21 + 13 + 3 = 37 .$$

□ 
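The similarity conditions of the definition are easy to check mechanically once w and m are known. The sketch below does so for the three problems of the example; the helper names and the (generators, target) tuple layout are assumptions made for illustration.

```python
from math import gcd


def max_target(generators, bound):
    """max(K): the largest target representable with coefficients below bound."""
    return sum((bound - 1) * a for a in generators)


def transform(generators, target, w, m):
    """Multiply the generators and the target by w and reduce them mod m."""
    return [(w * a) % m for a in generators], (w * target) % m


def are_similar(k, k_prime, bound, w, m):
    """Check the definition of similarity for given multiplier w and modulus m."""
    (gens, b), (gens_p, b_p) = k, k_prime
    return (gcd(w, m) == 1
            and m > max_target(gens, bound)
            and m > max_target(gens_p, bound)
            and transform(gens, b, w, m) == (gens_p, b_p))


K1 = ([19, 31, 46], 50)
K2 = ([32, 15, 19], 47)
K3 = ([21, 13, 3], 34)
```

Calling `are_similar(K1, K2, 2, 7, 101)` and `are_similar(K1, K3, 2, 33, 101)` confirms both claims of the example. Note that the check needs w and m as inputs; it does not discover them.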

Given  two  compact  knapsack  problems,  we  do  not  know  how  to  check 
their  similarity  or  how  to  compute  the  w  and  m  parameters  that  prove 
their  similarity  in  polynomial  time.  However,  for  our  purposes  this  will 
not  be  a  problem  since  we  will  always  know  these  parameters  from  previous 
computations. 

The most important property of the similarity relation is:


Theorem 3: If K and K' are similar, they have the same integral and bounded
solutions.


Proof: Let $x_1, \ldots, x_n$ be integers satisfying the equation

$$\sum_{i=1}^{n} x_i a_i = b .$$

Multiplying this equation by w and reducing it mod m, we get

$$\sum_{i=1}^{n} x_i (w a_i) \equiv wb \pmod{m} .$$

Since the $x_i$'s are integers, we can replace wb and each $w a_i$ by $b'$ and
$a'_i$, which are their reductions mod m:

$$\sum_{i=1}^{n} x_i a'_i \equiv b' \pmod{m} .$$

If each $x_i$ satisfies $0 \le x_i \le \ell - 1$ and $m > \sum_{i=1}^{n} (\ell - 1) a'_i$, both sides of the
equation are integers in the range [0,m), and thus the equation must hold
without the (mod m) clause:

$$\sum_{i=1}^{n} x_i a'_i = b' .$$

This proves that any integral and bounded solution of the original
problem  is  a  solution  of  the  transformed  problem,  and  by  symmetry  the  two 
compact  knapsacks  have  identical  solutions.  Note,  however,  that  over 
the  real  numbers  or  over  unbounded  integers  the  two  equations  can  have 
very  different  sets  of  solutions.  Q.E.D. 
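Theorem 3 and the closing remark of its proof can be checked by brute force on the earlier three-generator example. The sketch below enumerates all bounded solutions; the function name is hypothetical, and the enumeration is only feasible for toy sizes.

```python
from itertools import product


def bounded_solutions(generators, target, bound):
    """All coefficient vectors with 0 <= x_i < bound that hit the target."""
    return {xs for xs in product(range(bound), repeat=len(generators))
            if sum(x * a for x, a in zip(xs, generators)) == target}


# The three similar problems from the earlier example (multipliers 7 and 33 mod 101).
sols_k1 = bounded_solutions([19, 31, 46], 50, 2)
sols_k2 = bounded_solutions([32, 15, 19], 47, 2)
sols_k3 = bounded_solutions([21, 13, 3], 34, 2)

# With a looser bound of 8 the modulus 101 no longer exceeds max(K),
# so the solution sets are allowed to differ.
loose_k1 = bounded_solutions([19, 31, 46], 50, 8)
loose_k3 = bounded_solutions([21, 13, 3], 34, 8)
```

With the bound 2 all three sets coincide, while with the bound 8 the third problem gains the extra solution (0, 1, 7), which does not solve the first problem.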

The basic idea behind the new algorithm is quite simple: Given an
n-generator knapsack problem $K_1$, we search for n-1 additional n-generator
problems $K_2, \ldots, K_n$ which are all similar to $K_1$. These n problems form
a system of n linear equations in n unknowns $x_i$, which can be easily
solved over the rationals or the integers mod $\ell$. If the generated
system is non-singular and its unique solution is integral and properly
bounded, we are done. In fact, this approach is advantageous whenever
the rank (mod $\ell$) of the system is larger than n/2, since the solution
set of such a system contains fewer than $\ell^{n/2}$ points and their enumeration
is faster than the use of the best previously published algorithm.

Example: The three equations in the previous example form a non-singular
system over the rational numbers, whose unique solution is $x_1 = 1$, $x_2 = 1$,
$x_3 = 0$. Instead of solving the equations over the rationals, we can
reduce them mod 2:

$$x_1 \oplus x_2 = 0 \pmod{2}$$

$$x_2 \oplus x_3 = 1 \pmod{2}$$

$$x_1 \oplus x_2 \oplus x_3 = 0 \pmod{2}$$

and solve this simplified system over GF(2).
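Solving the system over the rationals can be sketched with exact fractions. This is a generic Gauss-Jordan elimination written for illustration (the function name is an assumption), applied to the three similar equations with generators (19, 31, 46), (32, 15, 19) and (21, 13, 3).

```python
from fractions import Fraction


def solve_rational(matrix, rhs):
    """Gauss-Jordan elimination over the rationals.
    Returns the unique solution, or None if the system is singular."""
    n = len(matrix)
    rows = [[Fraction(v) for v in row] + [Fraction(r)]
            for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if rows[r][col] != 0), None)
        if pivot is None:
            return None
        rows[col], rows[pivot] = rows[pivot], rows[col]
        pivot_value = rows[col][col]
        rows[col] = [v / pivot_value for v in rows[col]]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [v - factor * p for v, p in zip(rows[r], rows[col])]
    return [row[n] for row in rows]


solution = solve_rational([[19, 31, 46], [32, 15, 19], [21, 13, 3]],
                          [50, 47, 34])
```

The result still has to be checked for integrality and for the bound on the coefficients before it is accepted as a decryption.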


The  formal  analysis  of  the  expected  rank  of  generated  systems  is 
not easy. The set of modular multiples of a randomly chosen vector
$(a_1, \ldots, a_n)$ forms a lattice in the n-dimensional cube of side m, which is
usually uniform and isotropic. Extensive experimentation has shown that
when the original problem has only one solution (which is always the case
in cryptographic knapsacks), the probability of n randomly chosen points
in this lattice to span the n-dimensional space is very high. A partial
result  that  supports  this  claim  is: 

Theorem 4: Let $(a_1, \ldots, a_n)$ be an integral point and let m be a modulus
which is greater than all the $a_i$'s. Then for a randomly chosen integral
w in [0,m), the probability of $(a_1, \ldots, a_n)$ and $(w a_1 \bmod m, \ldots, w a_n \bmod m)$
to be linearly dependent over the reals is

$$\gcd(a_1, \ldots, a_n) / \max(a_1, \ldots, a_n) .$$

Proof (sketch): Without loss of generality, we can assume that $a_1 =
\max(a_1, \ldots, a_n)$. Let $P_1, \ldots, P_{a_1}$ be the points on the continuous line segment

$$(t a_1, \ldots, t a_n), \qquad 0 \le t < m$$

defined by

$$t = (i-1) m / a_1 .$$

For every point $(t a_1, \ldots, t a_n)$ between $P_i$ and $P_{i+1}$, the point
$(t a_1 \bmod m, \ldots, t a_n \bmod m)$ is linearly dependent on $(a_1, \ldots, a_n)$ over the
reals if and only if the point $P_i$ is congruent to (0,...,0) modulo m. It is
easy to show that exactly $\gcd(a_1, \ldots, a_n)$ of the $P_i$ points have this property,
and thus the probability of linear dependence for randomly chosen t is
$\gcd(a_1, \ldots, a_n)/a_1$. Since the points with integral values of t are equally
distributed among the various $(P_i, P_{i+1})$ segments, this probability applies
to them as well. Q.E.D.

Corollary: If $\gcd(a_1, \ldots, a_n) = 1$ and the $a_i$'s are sufficiently large, it
is extremely unlikely that a randomly chosen transformed equation will be
linearly dependent on the original equation.

We  were  unable  to  extend  this  proof  technique  to  the  case  of  n  similar 
equations,  but  our  numerical  experiments  indicate  that  the  relative  frequency 
of  singular  systems  is  similar  to  that  expected  from  n  x  n  matrices  whose 
entries  are  chosen  at  random  from  [0,m).  When  m  is  large,  this  relative 
frequency  is  extremely  small  and  does  not  have  a  practical  significance 
in  cryptanalysis. 

4. The Algorithm

The  main  problem  in  applying  the  method  outlined  in  the  previous  section 
is  how  to  choose  the  m  and  w  parameters  that  transform  the  original  problem 
K  into  a  similar  problem  K'.  When  m  is  a  fixed  prime  >  max(K)  and  w  varies 
between 1 and m-1, each generator $a'_i$ in K' (i.e., each $w a_i \pmod{m}$)
becomes uniformly distributed (in a pseudo-random sense) between 1 and m-1.

To  satisfy  m  >  max(K'),  all  these  random  variables  must  be  simultaneously 
small.  Assuming  that  their  distributions  are  independent,  the  probability 
of  this  event  can  be  estimated  as  follows: 

Lemma 5: Given n independent and uniformly distributed random variables
$a'_i \in [0,m)$, the probability P that $m > \sum_{i=1}^{n} (\ell - 1) a'_i$ is $O((\ell n/e)^{-n})$.

Proof: The probability of n independent and uniformly distributed random
variables $r_i \in [0,1)$ to satisfy $\sum_{i=1}^{n} r_i \le d \le 1$ is equal to the volume cut
from the n-dimensional unit cube by the hyperplane $\sum_{i=1}^{n} r_i = d$, which is $d^n/n!$.
By scaling up the range of the $r_i$'s to [0,m) and using the bound $d = m/(\ell - 1)$,
we get $P = 1/((\ell - 1)^n \cdot n!)$. By Stirling's formula, this probability is
$O((\ell n/e)^{-n})$. Q.E.D.

Corollary: The expected number of useful multipliers w is $O(m \cdot (\ell n/e)^{-n})$,
and this value is larger than 1 whenever m has more than $O(n \log(\ell n/e))$
bits.

Example: A knapsack problem with ten generators and twenty bit coefficients
is likely to have over $2^{80}$ useful multipliers when the modulus is 300 bits
long. However, a simple trial-and-error is not likely to find them, since
they are scattered in $[0, 2^{300})$ with a relative frequency of less than
$2^{-220}$. In fact, for any $n > 3$ the $O((\ell n/e)^{-n})$ probability of success is
even lower than the $O(\ell^{-n})$ probability of guessing the correct $x_i$ solution
of the original knapsack problem!
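The orders of magnitude in this example can be reproduced from Lemma 5. The sketch below evaluates, in base-2 logarithms, both the expression $1/((\ell-1)^n \cdot n!)$ from the proof and the estimate $(\ell n/e)^{-n}$; the function names are illustrative, and the two figures differ slightly because the estimate drops the lower-order factor of Stirling's formula.

```python
from math import e, factorial, log2


def log2_exact_probability(n, bound):
    """log2 of P = 1 / ((bound - 1)^n * n!), the volume from the proof of Lemma 5."""
    return -(n * log2(bound - 1) + log2(factorial(n)))


def log2_stirling_estimate(n, bound):
    """log2 of the estimate (bound * n / e)^(-n)."""
    return -n * log2(bound * n / e)


# Ten generators, twenty-bit coefficients, 300-bit modulus.
exact = log2_exact_probability(10, 2 ** 20)
estimate = log2_stirling_estimate(10, 2 ** 20)
log2_expected_multipliers = 300 + estimate
```

For these parameters the volume expression comes out at roughly $2^{-222}$ and the estimate at roughly $2^{-219}$, which brackets the relative frequency quoted in the example.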


As far as we know, there are no efficient number-theoretic algorithms
for the simultaneous minimization (under modular multiplication) of three or
more natural numbers. The algorithm presented in this section is based on
combinatorial ideas, and it should be viewed as a first attempt at solving
this problem. Better algorithms (based on other approaches) undoubtedly
exist, and research in this direction is still at a preliminary stage.

Our algorithm is described in terms of a free parameter s, whose
exact value will be determined later. It attempts to minimize the various
generators in n successive stages. At each stage $1 \le k \le n$, it computes
a set of s "independent" multipliers $w_1^k, \ldots, w_s^k$, each one of which makes
the first k generators small under modular multiplication:

$$\forall\, 1 \le i \le k \;\; \forall\, 1 \le j \le s, \quad w_j^k a_i \pmod{m} \text{ is small.}$$

The final s multipliers $w_1^n, \ldots, w_s^n$ have the desired property with respect
to all the $a_i$ generators.

An informal description of the algorithm is:

k=0 (initialization): Choose a sufficiently large prime modulus m and
        s random numbers $w_1^0, \ldots, w_s^0$ in [0,m).

k=1,...,n (iteration): Form the set U of all the $2^s$ sums of subsets of
        the s numbers $w_j^{k-1}$. The new multipliers $w_j^k$ are
        defined as the s elements of U-{0} that make $a_k$
        smallest under modular multiplication (regardless
        of what they do to the other generators).
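The informal description can be turned into a small brute-force sketch. It enumerates all $2^s$ subset sums at every stage, so it is usable only for small s; the function names, the choice of the Mersenne prime $2^{61}-1$ as modulus, the seed and the toy sizes are all assumptions made for illustration, and such toy parameters are not expected to reach m > max(K').

```python
import random


def subset_sums_mod(values, m):
    """Return all 2^len(values) subset sums of values, reduced mod m."""
    sums = [0]
    for v in values:
        sums += [(u + v) % m for u in sums]
    return sums


def find_multipliers(generators, m, s, rng):
    """Stage-by-stage search for s multipliers that shrink the generators mod m."""
    multipliers = [rng.randrange(1, m) for _ in range(s)]        # stage k = 0
    for a_k in generators:                                       # stages k = 1..n
        candidates = set(subset_sums_mod(multipliers, m)) - {0}  # the set U - {0}
        # Keep the s candidates that make a_k smallest under multiplication mod m.
        multipliers = sorted(candidates, key=lambda w: (w * a_k) % m)[:s]
    return multipliers


M = 2 ** 61 - 1                      # prime modulus (hypothetical toy choice)
rng = random.Random(1980)
gens = [rng.randrange(1, M) for _ in range(3)]
ws = find_multipliers(gens, M, 10, rng)
```

After the last stage, every product of a returned multiplier with the last generator is among the s smallest values found, while the products with earlier generators have grown again through the subset additions.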

Appealing once more to the pseudo-random behaviour of modular
multiplication, we can show:

Theorem 6: For all $1 \le i \le n$, $1 \le j \le s$ and $1 \le k \le n$, the expected value
of $w_j^k a_i \pmod{m}$ is

$$\begin{cases}
m/2 & \text{when } k < i \\
mj/2^s & \text{when } k = i \\
(m/2^s)(s/2)^{k-i+1} & \text{when } k > i .
\end{cases}$$

Proof: The value of the i-th generator does not affect the choice of the
multipliers at stages k=1,...,i-1, and thus $w_j^k a_i \pmod{m}$ fluctuates randomly
in [0,m) and its expected value is m/2. At stage k=i, $w_j^k a_i \pmod{m}$ is
chosen as the j-th smallest element in a pseudo-random set of $2^s$ points in
[0,m), and thus its expected value is $mj/2^s$.

At stage k=i+1, $w_j^{i+1} a_i \pmod{m}$ is by definition the sum of some subset
of the s numbers $w_1^i a_i \pmod{m}, \ldots, w_s^i a_i \pmod{m}$, and thus its expected size
is approximately

$$\frac{1}{2} \sum_{j=1}^{s} (mj/2^s) \approx (m/2^s)(s/2)^2 .$$

At any later stage, the subset addition increases this value by a factor
of s/2, and thus at stage $k > i$ the expected value is $(m/2^s)(s/2)^{k-i+1}$. Q.E.D.

The key to the efficiency of the algorithm is the sawtooth behaviour of
the expected value of each $w_j^k a_i \pmod{m}$ as a function of the stage k: it
drops sharply at stage k=i but increases only moderately at later stages
(when the other generators are handled).


Example: Let m be a 300 bit modulus, let n be 4 and let s be 32. Then the
expected size (in bits) of $w_j^k a_i \pmod{m}$ as a function of the stage k and
the generator i is:

         i=1    i=2    i=3    i=4

  k=1    268    300    300    300

  k=2    276    268    300    300

  k=3    280    276    268    300

  k=4    284    280    276    268
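The table follows from Theorem 6 by taking base-2 logarithms. The sketch below regenerates it under two simplifying assumptions that match the rounding of the table: the factor 1/2 in the k < i case and the factor j in the k = i case are ignored. The function name is hypothetical.

```python
from math import log2


def expected_bits(m_bits, s, i, k):
    """Approximate size in bits of w_j^k * a_i (mod m) after stage k (Theorem 6)."""
    if k < i:
        return m_bits                    # not yet handled: about m/2, shown as m_bits
    if k == i:
        return m_bits - s                # m*j/2^s, ignoring the small factor j
    # Each later stage multiplies the size by roughly s/2.
    return round(m_bits - s + (k - i + 1) * log2(s / 2))


table = [[expected_bits(300, 32, i, k) for i in range(1, 5)]
         for k in range(1, 5)]
```

Each row of `table` corresponds to one stage k and each column to one generator i, in the same order as the table above.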

For any multiplier $w_j^n$ computed at the last stage of the algorithm,
the expected value of the sum of the transformed generators,
$\sum_{i=1}^{n} w_j^n a_i \pmod{m}$, is at most

$$\sum_{i=1}^{n} (m/2^s)(s/2)^{n-i+1} \approx (m/2^s)(s/2)^n .$$

To satisfy the condition $m > \max(K')$, the parameter s must satisfy

$$m > (\ell - 1)(m/2^s)(s/2)^n .$$

By taking the logarithm of both sides and rearranging the terms, we get the
basic inequality

$$s > n \log s + \log(\ell - 1) - n .$$

For any given n and $\ell$, we can use numeric methods to solve this implicit
inequality to find the smallest s that satisfies it. To estimate the
asymptotic growth rate of s, we can consider the single-parameter set of
problems in which n is both the number of generators and the length of each
coefficient. Since $\log(\ell - 1) \approx n$, the inequality simplifies to $s > n \log s$.
The value $s = n \log n$ does not satisfy the inequality, but any
$\varepsilon$-improvement in it of the form $s = (1+\varepsilon)\, n \log n$ satisfies it for all
sufficiently large values of n:

$$n \log n + \varepsilon\, n \log n = s > n \log s = n \log n + n \log\log n + n \log(1+\varepsilon) .$$

Consequently, the asymptotic behaviour of s in this case is $O(n \log n)$.
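The numeric solution of the basic inequality $s > n \log s + \log(\ell - 1) - n$ can be sketched as a simple upward scan. The function names are illustrative. One assumption is made explicit: the scan starts at $s \approx n/\ln 2$, where the left side begins to grow faster than the right side, because the estimates behind the inequality are not meaningful for very small s.

```python
from math import ceil, log, log2


def satisfies_basic_inequality(s, n, bound):
    """Check s > n*log2(s) + log2(bound - 1) - n (requires bound >= 2)."""
    return s > n * log2(s) + log2(bound - 1) - n


def smallest_s(n, bound):
    """Smallest integer s >= n/ln(2) that satisfies the basic inequality."""
    s = max(2, ceil(n / log(2)))
    while not satisfies_basic_inequality(s, n, bound):
        s += 1
    return s
```

For 0-1 coefficients (`bound = 2`) the logarithmic term vanishes, which gives the most favourable case for a given number of generators.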

A straightforward implementation of the iteration stages requires $O(2^s)$
operations per stage. A better implementation can be obtained by using the
Schroeppel-Shamir [1979] algorithm in order to find the smallest sums of
subsets (mod m) in $O(2^{s/2})$ time and $O(2^{s/4})$ space. Further optimizations
can eliminate the first two stages ($w_1^2, \ldots, w_s^2$ can be directly computed in
polynomial time by the "best approximations" algorithm of number theory),
and reduce the complexity of the remaining stages by using a decreasing
sequence of s values (the final sizes of most of the transformed generators
are unnecessarily low - it suffices to make all these sizes roughly equal).

A problem with n generators and n bits per coefficient contains a
total of $n^2$ unknown bits, and thus the best previously published algorithm
for solving it requires $O(2^{n^2/2})$ operations. By using the $O(2^{s/2})$
implementation of the new algorithm with $s \approx n \log n$ we can solve the
problem in $O(2^{(n \log n)/2})$ operations, which
is a very substantial saving even for moderate values of n.

In practical applications, s must be limited to 80 or less in order
to make the $O(2^{s/2})$ time complexity feasible. When $\ell$ is small and s=80,
the basic inequality $s > n \log s + \log(\ell - 1) - n$
yields $n \le 15$ as the practical upper limit on the number of generators our
algorithm can handle. When n is slightly decreased, $\ell$ can be considerably
increased since it occurs only within a log. For example, when n=10, s=60
and the improved algorithm is used, $\ell$ can be as large as one million. The
total number of unknown $x_i$ bits in such a 10-generator knapsack problem is
200, and even with the best previous algorithm and an ultimate 1 picosecond
machine, its solution takes longer than the age of the universe. The new
algorithm, on the other hand, can solve it in less than 20 minutes on a
conventional 1 microsecond machine.
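The two running times can be checked with back-of-the-envelope arithmetic. The sketch below assumes exactly $2^{100}$ operations for the enumerative algorithm (half of the 200 unknown bits) and $2^{30}$ operations for the new one (s/2 with s=60), ignoring all constant factors; the figure of about 1.4e10 years for the age of the universe is a present-day estimate supplied here as an assumption.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600
AGE_OF_UNIVERSE_YEARS = 1.4e10       # assumed modern estimate, not from the paper


def running_time_seconds(log2_operations, seconds_per_operation):
    """Time for 2^log2_operations steps at the given cost per step."""
    return 2.0 ** log2_operations * seconds_per_operation


# Enumerative algorithm: 2^(200/2) operations at 1 picosecond each.
old_years = running_time_seconds(200 / 2, 1e-12) / SECONDS_PER_YEAR

# New algorithm: 2^(60/2) operations at 1 microsecond each.
new_minutes = running_time_seconds(60 / 2, 1e-6) / 60
```

Under these assumptions `old_years` exceeds the assumed age of the universe, while `new_minutes` stays below twenty.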


5.  Consequences  of  the  Algorithm 

The  analysis  of  the  expected  behaviour  of  our  algorithm  in  the 
previous  section  was  based  on  certain  plausible  but  unproved  assumptions 
about  the  behaviour  of  the  generators  under  modular  multiplication.  So 
far we were unable to make this analysis rigorous, and thus all the
consequences of the algorithm mentioned in this section are somewhat speculative.

For any fixed $n > 3$, the asymptotic complexity of our algorithm
(when the sizes of $a_i$ and $\ell$ grow to infinity) is non-polynomial, and thus
it does not solve the basic theoretical question of whether n-generator
knapsack problems are in P, NP-complete, or somewhere in between. However,
the efficiency of the new algorithm for small values of n makes them an
unacceptable security risk in cryptographic applications, and thus a large
key size seems to be an inherent feature of knapsack-based cryptosystems.

One of the main cryptanalytic advantages of the new algorithm is that
once the appropriate multipliers and moduli are found (by preprocessing
the published generators), the decryption of actual cyphertexts b becomes
extremely fast: all the cryptanalyst has to do is to compute a vector
of n modular multiples of b and to solve the resultant system of linear
equations. This behaviour can justify weeks or even months of
preprocessing time, and compares favorably with other knapsack-solving
algorithms in which every decryption attempt is independently time consuming.

The algorithm strongly indicates that (unintentional) trapdoors are
built into most uniquely decodable knapsack systems, since the knowledge
of the n modular multipliers makes them solvable in polynomial time. From
the complexity-theoretic point of view, these multipliers form short and
easily checkable proofs both for the existence and for the non-existence
of solutions - a phenomenon that characterizes problems in NP $\cap$ co-NP.
Furthermore,  the  uniformity  of  these  proofs  for  all  the  knapsack  problems 
represented  by  the  same  generators  indicates  that  the  circuit  complexity  of 
these  collections  of  problems  is  polynomial. 

Another  major  cryptographic  conclusion  is  related  to  the  security  of 
the  Merkle-Hellman  cryptosystem.  To  decode  a  cyphertext  in  this  system, 
the  cryptanalyst  can  either  solve  the  knapsack  problem  or  expose  the  secret 
trapdoor embedded in the public key. The NP-completeness of knapsack
problems is some indication that the first type of attack is not likely to
succeed,  but  the  difficulty  of  the  second  type  of  attack  is  an  open  problem 
about which almost nothing is currently known. The trapdoor suggested by
Merkle and Hellman is based on the repeated transformation of one set of
generators  into  a  similar  set  of  generators  via  modular  multiplications 
(whose  m  and  w  parameters  are  kept  secret).  When  the  number  of  scrambling 
stages  is  large,  the  resultant  generators  become  randomly-looking  numbers 
with  no  observable  structure  in  them.  The  main  (and  probably  the  only) 
cryptanalytic  attack  that  can  expose  the  initial  set  of  generators  is  to 
undo  the  similarity  transformations  one  at  a  time  in  reverse  order.  However, 
any  general  purpose  algorithm  for  finding  the  appropriate  m  and  w  parameters 
was  shown  in  this  paper  to  lead  to  an  efficient  knapsack-solving  algorithm, 
and  thus  the  detection  of  the  secret  trapdoor  is  not  likely  to  be  any 
easier  than  the  direct  solution  of  the  original  knapsack  problem. 